Abstract — Recent research has shown that network coding can
be  used  in  content  distribution  systems  to  improve  the  speed  of 
downloads  and  the  robustness  of  the  systems.  However,  such 
systems  are  very  vulnerable  to  attacks  by  malicious  nodes,  and 
we  need  to  have  a  signature  scheme  that  allows  nodes  to  check  the 
validity  of  a  packet  without  decoding.  In  this  paper,  we  propose 
such  a  signature  scheme  for  network  coding.  Our  scheme  makes 
use  of  the  linearity  property  of  the  packets  in  a  coded  system,  and 
allows  nodes  to  check  the  integrity  of  the  packets  received  easily. 
We  show  that  the  proposed  scheme  is  secure,  and  its  overhead 
is  negligible  for  large  files. 

I.  Introduction 

Network  coding  was  first  introduced  in  [1]  as  an  alternative 
to  the  traditional  routing  networks,  and  it  has  been  shown  that 
random  linear  coding  can  be  used  to  improve  the  throughput 
for  multicast  and  even  unicast  transmissions  [2],  [3],  [4]. 
More  recently,  several  researchers  explored  the  use  of  network 
coding  in  content  distribution  and  distributed  storage  systems 
[5],  [6].  Traditionally,  the  solutions  for  content  distribution 
are  based  on  a  client-server  model,  where  a  central  server 
sends  the  entire  file  to  each  client  that  requests  it.  This 
kind  of  approach  becomes  inefficient  when  the  file  size  is 
large  or  when  there  are  many  clients,  as  it  takes  up  a  large 
amount  of  bandwidth  and  server  resources.  In  recent  years, 
peer-to-peer  (P2P)  networks  have  emerged  as  an  alternative 
to  traditional  content  distribution  solutions  to  deliver  large 
files.  A  P2P  network  has  a  fully  distributed  architecture,  and 
the  peers  in  the  network  form  a  cooperative  network  that 
shares  the  resources,  such  as  storage,  CPU,  and  bandwidth, 
of  all  the  computers  in  the  network.  This  architecture  offers  a 
cost-effective  and  scalable  way  to  distribute  software  updates, 
videos,  and  other  large  files  to  a  large  number  of  users. 

The  best  example  of  a  P2P  cooperative  architecture  is  the 
BitTorrent  system  [7],  which  splits  large  files  into  small  blocks, 
and  after  a  node  downloads  a  block  from  the  original  server 
or  from  another  peer,  it  becomes  a  server  for  that  particular 
block.  Although  BitTorrent  has  become  extremely  popular 
for  distribution  of  large  files  over  the  Internet,  it  may  suffer 
from a number of inefficiencies that decrease its overall
performance. For example, scheduling is a key problem in
BitTorrent:  it  is  difficult  to  efficiently  select  which  block(s)  to 
download  first  and  from  where.  If  a  rare  block  is  only  found 
on  peers  with  slow  connections,  this  would  create  a  bottleneck 
for  all  the  downloaders.  Several  ad  hoc  strategies  are  used  in 
BitTorrent  to  ensure  that  different  blocks  are  equally  spread 
in  the  system  as  the  system  evolves.  References  [5],  [6] 
propose  the  use  of  network  coding  to  increase  the  efficiency 
of  content  distribution  in  a  P2P  cooperative  architecture.  The 
main  idea  of  this  approach  is  the  following  (see  Fig.  1).  The 
server  breaks  the  file  to  be  distributed  into  small  blocks,  and 
whenever  a  peer  requests  a  file,  the  server  sends  a  random 
linear  combination  of  all  the  blocks.  As  in  BitTorrent,  a  peer 
acts  as  a  server  to  the  blocks  it  has  obtained.  However,  in  a 
linear  coding  scheme,  any  output  from  a  peer  node  is  also 
a  random  linear  combination  of  all  the  blocks  it  has  already 
received.  A  peer  node  can  reconstruct  the  whole  file  when 
it  has  received  enough  degrees  of  freedom  to  decode  all  the 
blocks.  This  scheme  is  completely  distributed,  and  eliminates 
the  need  for  a  scheduler,  as  any  block  transmitted  contains 
partial  information  of  all  the  blocks  that  the  sender  possesses. 
It  has  been  shown  both  mathematically  [5]  and  through  live 
trials  [8]  that  the  random  linear  coding  scheme  significantly 
reduces  the  downloading  time  and  improves  the  robustness  of 
the  system. 

A  major  concern  for  any  network  coding  system  is  the 
protection  against  malicious  nodes.  Take  the  above  content 
distribution  system  for  example.  If  a  node  in  the  P2P  network 
behaves maliciously, it can create a polluted block with
valid coding coefficients, and then send it out. Here, coding
coefficients  refer  to  the  random  linear  coefficients  used  to 
generate  this  block.  If  there  is  no  mechanism  for  a  peer  to 
check  the  integrity  of  a  received  block,  a  receiver  of  this 
polluted  block  would  not  be  able  to  decode  anything  for  the 
file  at  all,  even  if  all  the  other  blocks  it  has  received  are  valid. 
To  make  things  worse,  the  receiver  would  mix  this  polluted 
block  with  other  blocks  and  send  them  out  to  other  peers,  and 
the  pollution  can  quickly  propagate  to  the  whole  network.  This 


1-4244-1429-6/07/$25.00 ©2007 IEEE




Report Documentation Page (Standard Form 298, Rev. 8-98)

Report date: June 2007
Title: Signatures for Content Distribution with Network Coding
Performing organization: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, Cambridge, MA 02139
Distribution/availability statement: Approved for public release; distribution unlimited
Supplementary notes: 2007 IEEE International Symposium on Information Theory (ISIT2007), Nice, France, 24-29 June 2007.
Security classification (report, abstract, this page): unclassified
Number of pages: 5

ISIT2007,  Nice,  France,  June  24  -  June  29,  2007 




Fig. 1. Content distribution with network coding. Assume the file being
distributed is broken into three blocks, P1, P2, and P3. Any packet being
transmitted is a random linear combination of all the blocks the sender has.
For example, the packet sent from the source to peer A is a combination of
P1, P2, and P3, whereas the packet sent from peer A to D is a combination
of blocks A1 and A2. A peer is able to decode the whole file when it receives
3 linearly independent blocks.


makes coding-based content distribution even more vulnerable
than traditional P2P networks, such as BitTorrent. Similar
security problems arise in all systems that use network coding,
such as multicast networks. Several attempts have been made to
address this problem. Ho et al. introduced Byzantine modification
detection in multicast networks with random network
coding [9]. They added a simple polynomial hash value to
each packet, so that a receiver node can detect the presence of
a Byzantine attacker with high probability, provided that the
attacker is unable to design and supply modified packets
with complete knowledge of the packets received by other
nodes. Jaggi et al. [10] proposed a distributed network coding
scheme for multicast networks that is resilient in the presence
of Byzantine adversaries. They view the adversarial nodes as
a second source, and judiciously add redundancy at the real
source to help the receivers distill the source information
from the received mixtures. References [5], [11] proposed
using homomorphic hash functions in content distribution
systems to detect polluted packets, and [12] suggested the use
of a Secure Random Checksum (SRC), which requires less
computation than the homomorphic hash function. However,
[12] requires a secure channel to transmit the SRCs to all
the nodes in the network. Charles et al. [13] proposed a
signature scheme for network coding that does not require such
a secure channel for transmitting hash values and associated
digital signatures of received and transmitted blocks. This
signature scheme is based on the Weil pairing on elliptic curves
and provides authentication of the data in addition to pollution
detection, but the computational complexity of this solution is
quite high. Moreover, the security offered by elliptic curves
that admit the Weil pairing is still a topic of debate in the scientific
community.

In  this  paper,  we  propose  a  new  signature  scheme  that  is 
not  based  on  elliptic  curves,  and  is  designed  specifically  for 
random linear coded systems. In this scheme, we view all
blocks of the file as vectors, as in any network coding scheme,
and  make  use  of  the  fact  that  all  valid  vectors  transmitted  in 
the  network  should  belong  to  the  subspace  spanned  by  the 
original  set  of  vectors  from  the  file.  We  design  a  signature 
that  can  be  used  to  easily  check  the  membership  of  a  received 
vector  in  the  given  subspace,  and  at  the  same  time,  it  is  hard 
for  a  node  to  generate  a  vector  that  is  not  in  that  subspace  but 
passes  the  signature  test.  We  show  that  this  signature  scheme 
is  secure,  and  that  the  overhead  for  the  scheme  is  negligible 
for  large  files. 

The  rest  of  this  paper  is  organized  as  follows.  In  Section  II, 
we  describe  the  setup  of  the  problem,  and  introduce  notations 
that  will  be  used  throughout  this  paper.  We  present  the  new 
signature  scheme  in  Section  III  and  prove  that  it  is  secure. 
Overheads  and  other  aspects  of  the  scheme  are  discussed  in 
Section  IV,  and  finally,  the  paper  is  concluded  in  Section  V. 

II.  Problem  Setup 

In this section, we introduce the framework for a random
linear coding based content distribution system. This framework
can also be easily modified for use in distributed
storage systems. We model the network by a directed graph
$G_d = (N, A)$, where $N$ is the set of nodes, and $A$ is the set
of communication links. A source node $s \in N$ wishes to send
a large file to a set of client nodes, $T \subset N$. In this paper, we
refer to all the clients as peers. The large file is divided into $m$
blocks, and any peer receives different blocks from the source
node or from other peers. In this framework, a peer is also
a server for the blocks it has downloaded, and always sends out
random linear combinations of all the blocks it has obtained so
far to other peers. When a peer has received enough degrees
of freedom to decode the data, i.e., it has received $m$ linearly
independent blocks, it can reconstruct the whole file.

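Specifically, we view the $m$ blocks of the file, $\bar{\mathbf{v}}_1, \ldots, \bar{\mathbf{v}}_m$,
as elements in the $n$-dimensional vector space $\mathbb{F}_p^n$, where $p$ is
a prime. The source node augments these vectors to create
vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$, given by

$$\mathbf{v}_i = (0, \ldots, 1, \ldots, 0, \bar{v}_{i1}, \ldots, \bar{v}_{in}),$$

where the first $m$ elements are zero except that the $i$th one is
1, and $\bar{v}_{ij} \in \mathbb{F}_p$ is the $j$th element in $\bar{\mathbf{v}}_i$. Packets received by
the peers are linear combinations of the augmented vectors,

$$\mathbf{w} = \sum_{i=1}^{m} \beta_i \mathbf{v}_i,$$

where $\beta_i$ is the weight of $\mathbf{v}_i$ in $\mathbf{w}$. We see that the additional
$m$ elements in the front of the augmented vector keep track
of the $\beta$ values of the corresponding packet, i.e.,

$$\mathbf{w} = (\beta_1, \ldots, \beta_m, \bar{w}_1, \ldots, \bar{w}_n),$$

where $(\bar{w}_1, \ldots, \bar{w}_n)$ is the payload part of the packet, and
$(\beta_1, \ldots, \beta_m)$ is the code vector that is used to decode the
packets.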
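The augmentation and mixing steps above can be illustrated with a minimal Python sketch. The field size `P = 1019`, the block dimensions, and the helper names `augment` and `combine` are assumptions chosen only for illustration; they do not come from the paper.

```python
import random

P = 1019  # small prime field size, chosen only for illustration


def augment(blocks):
    """Prefix block i with the i-th unit vector of length m."""
    m = len(blocks)
    return [[1 if k == i else 0 for k in range(m)] + list(b)
            for i, b in enumerate(blocks)]


def combine(vectors, weights, p=P):
    """Return sum_i weights[i] * vectors[i], computed over F_p."""
    length = len(vectors[0])
    return [sum(wt * v[k] for wt, v in zip(weights, vectors)) % p
            for k in range(length)]


rng = random.Random(0)
m, n = 3, 5
blocks = [[rng.randrange(P) for _ in range(n)] for _ in range(m)]
v = augment(blocks)                           # augmented vectors v_1, ..., v_m
beta = [rng.randrange(P) for _ in range(m)]   # random weights beta_i
w = combine(v, beta)                          # one coded packet
code_vector, payload = w[:m], w[m:]
```

In this sketch the first `m` entries of `w` reproduce the weights `beta`, which is how a receiver learns the code vector without any side information.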

As mentioned in the previous section, this kind of network
coding scheme is vulnerable to pollution attacks by malicious
nodes [14], [15], and the pollution can quickly spread to other
parts of the network if a peer unwittingly mixes a polluted
packet into its outgoing packets. In uncoded systems,
the source knows all the blocks being transmitted in the
network and can therefore sign each one of them. In a coded
system, by contrast, each peer produces “new” packets, so standard digital
signature schemes do not apply. In the next section, we
introduce a novel signature scheme for the coded system.


III.  Signature  Scheme  for  Network  Coding


We note that the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ span a subspace $V$ of
$\mathbb{F}_p^{m+n}$, and a received vector $\mathbf{w}$ is a valid linear combination of
vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ if and only if it belongs to the subspace $V$.
This is the key observation for our signature scheme. In the
scheme described below, we present a system that is based
upon standard modular arithmetic (in particular the hardness
of the Discrete Logarithm problem) and upon an invariant
signature $\sigma(V)$ for the linear span $V$. Each node verifies the
integrity of a received vector $\mathbf{w}$ by checking the membership
of $\mathbf{w}$ in $V$ based on the signature $\sigma(V)$.

Our signature scheme is defined by the following ingredients,
which are independent of the file(s) to be distributed:

•  $q$: a large prime number such that $p$ is a divisor of $q - 1$.
Note that standard techniques, such as those used in the Digital
Signature Algorithm (DSA), apply to find such a $q$.

•  $g$: a generator of the group $G$ of order $p$ in $\mathbb{F}_q^*$. Since the
order of the multiplicative group $\mathbb{F}_q^*$ is $q - 1$, which is a
multiple of $p$, we can always find a subgroup, $G$, with
order $p$ in $\mathbb{F}_q^*$.

•  Private key: $K_{pr} = \{\alpha_i\}_{i=1,\ldots,m+n}$, a random set of
elements in $\mathbb{F}_p^*$. $K_{pr}$ is only known to the source.

•  Public key: $K_{pu} = \{h_i = g^{\alpha_i}\}_{i=1,\ldots,m+n}$. $K_{pu}$ is
signed by some standard signature scheme, e.g., DSA,
and published by the source.
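As a rough illustration of these ingredients, the following Python sketch generates toy parameters. It is not the DSA parameter generation procedure: it uses trial division and a tiny prime `p = 1019`, both assumptions made purely so the example runs instantly, and the function names are hypothetical.

```python
import random


def is_prime(n):
    """Trial-division primality test; adequate only for toy sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_q(p):
    """Smallest prime q of the form k*p + 1, so that p divides q - 1."""
    k = 1
    while not is_prime(k * p + 1):
        k += 1
    return k * p + 1


def find_generator(p, q):
    """Return an element of order p in the multiplicative group of F_q."""
    for base in range(2, q):
        g = pow(base, (q - 1) // p, q)
        if g != 1:          # g**p == 1 always holds, so g != 1 means order exactly p
            return g
    raise ValueError("no element of order p found")


def keygen(p, q, g, length, rng):
    """Private key alpha_i in F_p^*, public key h_i = g**alpha_i mod q."""
    alpha = [rng.randrange(1, p) for _ in range(length)]
    h = [pow(g, a, q) for a in alpha]
    return alpha, h


p = 1019
q = find_q(p)
g = find_generator(p, q)
alpha, h = keygen(p, q, g, 8, random.Random(0))
```

A real deployment would use the sizes quoted later in the paper (160-bit p, 1024-bit q) together with a probabilistic primality test and a cryptographically secure random source.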

To  distribute  a  file  in  a  secure  manner,  the  signature  scheme 
works  as  follows. 

1)  Using the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ from the file, the source
finds a vector $\mathbf{u} = (u_1, \ldots, u_{m+n}) \in \mathbb{F}_p^{m+n}$ orthogonal
to all vectors in $V$. Specifically, the source finds a nonzero
solution, $\mathbf{u}$, to the equations

$$\mathbf{v}_i \cdot \mathbf{u} = 0, \quad i = 1, \ldots, m.$$

2)  The source computes the vector $\mathbf{x} = (u_1/\alpha_1, u_2/\alpha_2, \ldots, u_{m+n}/\alpha_{m+n})$.

3)  The source signs $\mathbf{x}$ with some standard signature scheme
and publishes $\mathbf{x}$. We refer to the vector $\mathbf{x}$ as the
signature, $\sigma(V)$, of the file being distributed.

4)  The client node verifies that $\mathbf{x}$ is signed by the source.

5)  When a node receives a vector $\mathbf{w}$ and wants to verify
that $\mathbf{w}$ is in $V$, it computes

$$d = \prod_{i=1}^{m+n} h_i^{x_i w_i},$$

and verifies that $d = 1$.


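To see that $d$ is equal to 1 for any valid $\mathbf{w}$, we have

$$\begin{aligned}
d &= \prod_{i=1}^{m+n} h_i^{x_i w_i} \\
  &= \prod_{i=1}^{m+n} \left(g^{\alpha_i}\right)^{u_i w_i / \alpha_i} \\
  &= \prod_{i=1}^{m+n} g^{u_i w_i} \\
  &= g^{\sum_{i=1}^{m+n} u_i w_i} \\
  &= 1,
\end{aligned}$$

where the last equality comes from the fact that $\mathbf{u}$ is orthogonal
to all vectors in $V$.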
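The signing and verification steps, and the identity just derived, can be exercised with a small Python sketch. The parameters `P = 1019`, `Q = 2039`, `G = 4` are toy values assumed for illustration (P divides Q - 1 and G has order P modulo Q), and the way `orthogonal_vector` picks u (random tail, head solved from the unit-vector prefix) is one possible choice, not necessarily the authors'.

```python
import random

P, Q, G = 1019, 2039, 4   # toy parameters: P divides Q - 1, G has order P mod Q


def orthogonal_vector(blocks, rng, p=P):
    """Nonzero u with v_i . u = 0 for every augmented v_i = (e_i, block_i)."""
    n = len(blocks[0])
    tail = [rng.randrange(1, p) for _ in range(n)]
    head = [(-sum(b[j] * tail[j] for j in range(n))) % p for b in blocks]
    return head + tail


def sign(u, alpha, p=P):
    """x_i = u_i / alpha_i in F_p (modular inverse via pow, Python 3.8+)."""
    return [(ui * pow(ai, -1, p)) % p for ui, ai in zip(u, alpha)]


def verify(w, x, h, p=P, q=Q):
    """Check d = prod_i h_i^(x_i w_i) == 1 in F_q."""
    d = 1
    for hi, xi, wi in zip(h, x, w):
        d = d * pow(hi, (xi * wi) % p, q) % q   # exponents live in F_p
    return d == 1


rng = random.Random(1)
m, n = 3, 5
blocks = [[rng.randrange(P) for _ in range(n)] for _ in range(m)]
alpha = [rng.randrange(1, P) for _ in range(m + n)]   # private key
h = [pow(G, a, Q) for a in alpha]                     # public key
x = sign(orthogonal_vector(blocks, rng), alpha)       # signature of the file

beta = [rng.randrange(P) for _ in range(m)]
valid = beta + [sum(beta[i] * blocks[i][j] for i in range(m)) % P
                for j in range(n)]
polluted = valid[:]
polluted[-1] = (polluted[-1] + 1) % P                 # tamper with one payload symbol
```

Here `verify(valid, x, h)` succeeds, while the tampered packet fails because its inner product with u equals the last (nonzero) entry of u.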

Next,  we  show  that  the  system  described  above  is  secure.  In 
essence,  the  theorem  below  shows  that  given  a  set  of  vectors 
that  satisfy  the  signature  verification  criterion,  it  is  provably 
as  hard  as  the  Discrete  Logarithm  problem  to  find  new  vectors 
that  also  satisfy  the  verification  criterion  other  than  those  that 
are  in  the  linear  span  of  the  vectors  already  known. 
Definition 1. Let $p$ be a prime number and $G$ be a multiplicative
cyclic group of order $p$. Let $k$ and $n$ be two integers
such that $k < n$, and $\Gamma = \{h_1, \ldots, h_n\}$ be a set of generators
of $G$. Given a linear subspace, $V$, of rank $k$ in $\mathbb{F}_p^n$ such that
for every $\mathbf{v} \in V$, the equality $\Gamma^{\mathbf{v}} = \prod_{i=1}^{n} h_i^{v_i} = 1$ holds, we
define the $(p, k, n)$-Diffie-Hellman problem as the problem of
finding a vector $\mathbf{w} \in \mathbb{F}_p^n$ with $\Gamma^{\mathbf{w}} = 1$ but $\mathbf{w} \notin V$.

By this definition, the problem of finding an invalid vector
that satisfies our signature verification criterion is a $(p, m, m+n)$-Diffie-Hellman
problem. Note that in general, the $(p, n-1, n)$-Diffie-Hellman
problem has no solution. This is because
if $V$ has rank $n - 1$ and a $\mathbf{w}'$ exists such that $\Gamma^{\mathbf{w}'} = 1$
and $\mathbf{w}' \notin V$, then $\mathbf{w}'$ together with $V$ spans the whole space, and any
vector $\mathbf{w} \in \mathbb{F}_p^n$ would satisfy $\Gamma^{\mathbf{w}} = 1$. This is clearly not true;
therefore, no such $\mathbf{w}'$ exists.

Theorem 1. For any $k < n - 1$, the $(p, k, n)$-Diffie-Hellman
problem is as hard as the Discrete Logarithm problem.

Proof: Assume that we have an efficient algorithm to
solve the $(p, k, n)$-Diffie-Hellman problem, and we wish to
compute the discrete logarithm $\log_g(z)$ for some $z = g^x$,
where $g$ is a generator of a cyclic group $G$ with order $p$.
We can choose two random vectors $\mathbf{r} = (r_1, \ldots, r_n)$ and
$\mathbf{s} = (s_1, \ldots, s_n)$ in $\mathbb{F}_p^n$, and construct $\Gamma = \{h_1, \ldots, h_n\}$, where
$h_i = z^{r_i} g^{s_i}$ for $i = 1, \ldots, n$. We then find $k$ linearly independent
(and otherwise random) solution vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ to the
equations

$$\mathbf{v} \cdot \mathbf{r} = 0 \quad \text{and} \quad \mathbf{v} \cdot \mathbf{s} = 0.$$

Note that there exist $n - 2$ linearly independent solutions to the
above equations. Let $V$ be the linear span of $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$; it is
clear that any vector $\mathbf{v} \in V$ satisfies $\Gamma^{\mathbf{v}} = 1$. Now, if we have
an algorithm for the $(p, k, n)$-Diffie-Hellman problem, we can
find a vector $\mathbf{w} \notin V$ such that $\Gamma^{\mathbf{w}} = 1$. Since
$\Gamma^{\mathbf{w}} = g^{\mathbf{w} \cdot (x\mathbf{r} + \mathbf{s})}$ and $g$ has order $p$, this vector
satisfies $\mathbf{w} \cdot (x\mathbf{r} + \mathbf{s}) = 0$. Since $\mathbf{r}$ is statistically independent
from $(x\mathbf{r} + \mathbf{s})$, with probability greater than $1 - 1/p$, we have
$\mathbf{w} \cdot \mathbf{r} \neq 0$. In this case, we can compute

$$\log_g(z) = x = -\frac{\mathbf{w} \cdot \mathbf{s}}{\mathbf{w} \cdot \mathbf{r}}.$$

This means the ability to solve the $(p, k, n)$-Diffie-Hellman
problem implies the ability to solve the Discrete Logarithm
problem. ■

This proof is an adaptation of a proof that appeared in an
earlier publication by Boneh et al. [16].
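The last step of the reduction can be checked numerically with the following Python sketch. Because no efficient solver for the (p, k, n)-Diffie-Hellman problem is assumed to exist, the function `simulated_solver` is a stand-in that cheats by reading the secret exponent; it exists only to produce a vector w of the kind the proof postulates. The toy parameters `P`, `Q`, `G` are assumptions for illustration.

```python
import random

P, Q, G = 1019, 2039, 4   # toy parameters: P divides Q - 1, G has order P mod Q


def dot(a, b, p=P):
    """Inner product over F_p."""
    return sum(ai * bi for ai, bi in zip(a, b)) % p


def simulated_solver(c, r, rng, p=P):
    """Stand-in for the assumed solver: returns w with w.c = 0 and w.r != 0.
    It cheats by using c = x*r + s, which a real solver would not know."""
    last = len(c) - 1
    inv = pow(c[last], -1, p)
    while True:
        w = [rng.randrange(p) for _ in range(last)]
        w.append((-dot(w, c[:last]) * inv) % p)   # force w . c = 0
        if dot(w, r) != 0:                        # also guarantees w is outside V
            return w


rng = random.Random(2)
n = 4
x_secret = rng.randrange(1, P)
z = pow(G, x_secret, Q)                           # discrete-log instance z = g^x
while True:
    r = [rng.randrange(P) for _ in range(n)]
    s = [rng.randrange(P) for _ in range(n)]
    c = [(x_secret * ri + si) % P for ri, si in zip(r, s)]
    if c[-1] != 0:
        break
h = [pow(z, ri, Q) * pow(G, si, Q) % Q for ri, si in zip(r, s)]

w = simulated_solver(c, r, rng)
gamma_w = 1
for hi, wi in zip(h, w):
    gamma_w = gamma_w * pow(hi, wi, Q) % Q        # Gamma^w

# x = -(w . s) / (w . r) in F_p, as in the proof
x_recovered = (-dot(w, s) * pow(dot(w, r), -1, P)) % P
```

The recovered value matches the secret exponent, which is the sense in which a solver for the new problem would yield discrete logarithms.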


IV.  Discussion 


Our signature scheme makes use of the linearity
property of random linear network coding, and enables the
peers to check the integrity of packets without requiring
a secure channel, which the hash function and SRC
schemes [5], [11], [12] need. Also, the computation involved in the
signature generation and verification processes is very simple.

Next, we examine the overhead incurred by this signature
scheme. Let the file size be $M$ and let the file be divided into
$m$ blocks, each one of which is a vector in $\mathbb{F}_p^n$. The size of
each block is $B = n \log(p)$ and we have $M = mn \log(p)$.
The size of each augmented vector (with coding vectors in
the front) is $B_a = (m + n) \log(p)$, and thus, the overhead of
the coding vector is $m/n$ times the file size. Note that this
is the overhead pertaining to the linear coding scheme, not
to our signature scheme, and any practical network coding
system would make $m \ll n$. The initial setup of our signature
scheme involves the publishing of the public key, $K_{pu}$, which
has size $(m + n) \log(q)$. In typical cryptographic applications,
the size of $p$ is 20 bytes (160 bits), and the size of $q$ is 128
bytes (1024 bits); thus, the size of $K_{pu}$ is approximately equal
to $6(m + n)/(mn)$ times the file size.
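To make these ratios concrete, the short Python sketch below evaluates them for the 10 MB, m = 100 example used later in this section. Interpreting 10 MB as 10^7 bytes and the function name `setup_overheads` are assumptions made for illustration.

```python
def setup_overheads(file_bytes, m, p_bits=160, q_bits=1024):
    """Return (n, coding-vector overhead, public-key overhead), each
    overhead expressed as a fraction of the file size."""
    file_bits = file_bytes * 8
    n = file_bits / (m * p_bits)                      # symbols per block
    coding = m / n                                    # (m log p) / (n log p)
    pubkey = (m + n) * q_bits / (m * n * p_bits)      # (m+n) log q / (mn log p)
    return n, coding, pubkey


n, coding, pubkey = setup_overheads(10 * 10**6, 100)
```

With these inputs n is 5000, the coding-vector overhead is 2% of the file, and the one-time public key costs roughly 6.5% of the file, in line with the approximation 6(m + n)/(mn).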

For  distribution  of  each  file,  the  incremental  overhead  of 
our  scheme  consists  of  two  parts:  the  public  data,  Kpu,  and 
the  signature  vector,  x. 

For the public key, $K_{pu}$, we note that it cannot be fully
reused for multiple files, as it is possible for a malicious node
to generate an invalid vector that satisfies the check $d = 1$
using information obtained from previously downloaded files.
Specifically, let $\mathbf{x}_1$ be the signature of File 1, and $\mathbf{w}_1$ be a
valid received vector for File 1; we have

$$d = \prod_{i=1}^{m+n} h_i^{x_{1i} w_{1i}} = 1.$$

If the source then distributes File 2 using the same public
key, $K_{pu}$, and a different signature, $\mathbf{x}_2$, a malicious node
can construct a vector $\mathbf{w}_2$, where $w_{2i} = x_{1i} w_{1i} / x_{2i}$, which
satisfies the signature check

$$d = \prod_{i=1}^{m+n} h_i^{x_{2i} w_{2i}} = \prod_{i=1}^{m+n} h_i^{x_{1i} w_{1i}} = 1.$$

However, $\mathbf{w}_2$ is not a valid linear combination of the vectors
of File 2. To prevent this from happening, we can publish a
public key for each file, and as mentioned above, the overhead
is about $6(m + n)/(mn)$ times the file size, which is small as
long as $6 \ll m \ll n$. Note that if we republish $K_{pu}$ for every
new file, we can reuse the signature vector $\mathbf{x}$. Let $\mathbf{u}_2$ be a
vector that is orthogonal to all vectors in File 2; the source
can compute a new private key, $K_{pr} = \{\alpha_1, \ldots, \alpha_{m+n}\}$, given
by

$$\alpha_i = u_{2i} / x_i, \quad i = 1, \ldots, m+n.$$

The source then publishes the new public key, $K_{pu} = \{h_i = g^{\alpha_i}\}_{i=1,\ldots,m+n}$.
In this way, we do not need to publish new
$\mathbf{x}$ vectors for the subsequent files.
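The key-reuse forgery described above can be reproduced with a small Python sketch. The toy parameters `P`, `Q`, `G`, the file dimensions, and the helpers `make_signature`, `verify`, and `in_span` are assumptions for illustration; `make_signature` retries until every entry of u is nonzero so that the division by x_2i in the forgery is defined.

```python
import random

P, Q, G = 1019, 2039, 4   # toy parameters: P divides Q - 1, G has order P mod Q


def make_signature(blocks, alpha, rng, p=P):
    """x_i = u_i / alpha_i for a vector u orthogonal to the augmented blocks."""
    n = len(blocks[0])
    while True:
        tail = [rng.randrange(1, p) for _ in range(n)]
        head = [(-sum(b[j] * tail[j] for j in range(n))) % p for b in blocks]
        u = head + tail
        if all(u):                                   # keep every x_i invertible
            return [(ui * pow(ai, -1, p)) % p for ui, ai in zip(u, alpha)]


def verify(w, x, h, p=P, q=Q):
    """Check prod_i h_i^(x_i w_i) == 1 in F_q."""
    d = 1
    for hi, xi, wi in zip(h, x, w):
        d = d * pow(hi, (xi * wi) % p, q) % q
    return d == 1


def in_span(w, blocks, p=P):
    """True if the payload of w matches its own code vector for this file."""
    m, n = len(blocks), len(blocks[0])
    return all(w[m + j] == sum(w[i] * blocks[i][j] for i in range(m)) % p
               for j in range(n))


rng = random.Random(3)
m, n = 2, 4
alpha = [rng.randrange(1, P) for _ in range(m + n)]
h = [pow(G, a, Q) for a in alpha]                    # the same public key for both files
file1 = [[rng.randrange(P) for _ in range(n)] for _ in range(m)]
file2 = [[rng.randrange(P) for _ in range(n)] for _ in range(m)]
x1 = make_signature(file1, alpha, rng)
x2 = make_signature(file2, alpha, rng)

beta = [rng.randrange(1, P) for _ in range(m)]
w1 = beta + [sum(beta[i] * file1[i][j] for i in range(m)) % P for j in range(n)]
# Forgery: w_2i = x_1i * w_1i / x_2i, built from public values only
w2 = [(a * b * pow(c, -1, P)) % P for a, b, c in zip(x1, w1, x2)]
```

In this sketch `w2` passes the check against `x2` even though it is not a combination of the File 2 vectors, which is why the paper changes the key material between files.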

Alternatively, for every new file, we can randomly pick an
integer $i$ between 1 and $m + n$, select a new random value
for $\alpha_i$ in the private key, and publish the new $h_i = g^{\alpha_i}$. The
overhead for this method is $(m + n)$ times smaller than that
described in the previous paragraph, i.e., this overhead is only
$6/(mn)$ times the file size. As an example, if we have a file
of size 10 MB, divided into $m = 100$ blocks, the value of $n$
would be in the order of thousands, and thus, this overhead is
less than 0.01% of the file size. This method should provide
good security except in the case where we expect the vector $\mathbf{w}$
to have low variability, for example, when it has many zeros. Security
can be increased by changing more elements in the private key
for each new file.

However, if we only change one element in the public
key for each new file distributed, we also have to publish
a new signature $\mathbf{x}$, which is computed from a vector $\mathbf{u}$ that
is orthogonal to the subspace $V$ spanned by the file. Since
$V$ has dimension $m$, it is sufficient to replace only $m$
elements in $\mathbf{u}$ to generate a vector orthogonal to the new file.
Since the first $m$ elements in the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m$ are always
linearly independent (they are the code vectors), it suffices to
modify only the entries $u_1$ to $u_m$. Assume that the $i$th element
in the private key is the only one that has been changed for
the distribution of the new file, and that $i$ is between 1 and $m$;
then we only need to publish $x_1$ to $x_m$ for the new signature
vector. This part of the overhead has size $m \log(p)$, and the
ratio between this overhead and the original file size $M$ is $1/n$.
Again taking a 10 MB file as an example, this overhead is less than
0.1% of the file size.

Therefore, after the initial setup, each additional file distributed
only incurs a negligible amount of overhead using
our signature scheme.
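The two per-file figures quoted above (below 0.01% and below 0.1%) can be checked with a few lines of Python. As before, reading 10 MB as 10^7 bytes and the function name `incremental_overheads` are assumptions for illustration.

```python
def incremental_overheads(file_bytes, m, p_bits=160, q_bits=1024):
    """Per-file overheads, as fractions of the file size, when only one
    public-key element and the first m signature entries are republished."""
    file_bits = file_bytes * 8
    n = file_bits / (m * p_bits)                 # symbols per block
    one_key_element = q_bits / file_bits         # one new h_i: log q / (mn log p)
    partial_signature = m * p_bits / file_bits   # x_1..x_m: m log p / M = 1/n
    return n, one_key_element, partial_signature


n, key_part, sig_part = incremental_overheads(10 * 10**6, 100)
```

For these inputs the key update is about 0.0013% of the file and the partial signature is 0.02%, consistent with the bounds stated in the text.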

Finally, we would like to point out that, under our assumptions
that there is no secure side channel from the source to
all the peers and that the public key is available to all the
peers, our signature scheme has to be used on the original
file vectors, not on hash functions. This is because, to maintain
the security of the system, we would need a one-way hash
function that is homomorphic; however, we are not aware
of any such hash function. Although [5] and [11] suggested
the use of homomorphic hash functions for network coding,
[5] assumed that the intermediate nodes do not know the
parameters used for generating the hash function, and [11]
assumed that a secure channel is available to transmit the hash
values of all the blocks from the source node to the peers.
Under our more relaxed assumptions, these hash functions
would not work.

V.  Conclusions

Security is a main obstacle in the implementation
of content distribution networks using random linear network
coding. To tackle this problem, instead of trying to fit an
existing signature scheme to network coding based systems,
in this paper, we proposed a new signature scheme that is
made specifically for such systems. We introduced a signature
vector for each file distributed, and the signature can be used
to easily check the integrity of all the packets received for this
file. We have shown that breaking the proposed scheme is as hard as solving the
Discrete Logarithm problem, and that the overhead of this scheme
is negligible for a large file.